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ABSTRACT. In microbial ecology studies, the most commonly used 
ways of investigating alpha (within-sample) diversity are either 
to apply count-only measures such as Simpson's index to Opera- 
tional Taxonomic Unit (OTU) groupings, or to use classical phy- 
logenetic diversity (PD), which is not abundance-weighted. Al- 
though alpha diversity measures that use abundance information 
in a phylogenetic framework do exist, but are not widely used 
within the microbial ecology community. The performance of 
abundance-weighted phylogenetic diversity measures compared 
to classical discrete measures has not been explored, and the be- 
havior of these measures under rarefaction (sub-sampling) is not 
yet clear. In this paper we compare the ability of various alpha 
diversity measures to distinguish between different community 
states in the human microbiome for three different data sets. We 
also present and compare a novel one-parameter family of alpha 
diversity measures, BWPD e , that interpolates between classical 
phylogenetic diversity (PD) and an abundance-weighted exten- 
sion of PD. Additionally, we examine the sensitivity of these phy- 
logenetic diversity measures to sampling, via computational ex- 
periments and by deriving a closed form solution for the expecta- 
tion of phylogenetic quadratic entropy under re-sampling. In all 
three of the datasets considered, an abundance-weighted measure 
is the best differentiator between community states. OTU-based 
measures, on the other hand, are less effective in distinguishing 
community types. In addition, abundance- weighted phylogenetic 
diversity measures are less sensitive to differing sampling inten- 
sity than their unweighted counterparts. Based on these results 
we encourage the use of abundance-weighted phylogenetic di- 
versity measures, especially for cases such as microbial ecology 
where species delimitation is difficult. 



1. INTRODUCTION 

It is now well accepted that incorporating phylogenetic informa- 
tion into alpha (single-sample) and beta (between-sample) diversity 
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measures can be useful in a variety of ecological contexts. Phylo- 
genetic equivalents of all of major alpha diversity measures have 
been developed. Starting with Faith's original definition of phyloge- 
netic diversity (Faith, 1992), which generalizes species count, there 
are now phylogenetic generalizations of the Simpson index to Rao's 
quadratic entropy (Rao, 1982; Warwick and Clarke, 1995), the Shan- 
non index to phylogenetic entropy (Allen et al., 2009), and the Hill 
numbers to 9 D(T) (Chao et al., 2010). Phylogenetic diversity itself 
has been extended to incorporate taxon counts (Barker, 2002) and 
proportional abundance (Vellend et al., 2011). There have also been 
abundance-weighted measures that explicitly measure phylogenetic 
community structure (Fine and Kembel, 2011), or an "effective num- 
ber of species" (Chao et al., 2010). Many diversity measures can be 
tidily expressed in the framework of Leinster and Cobbold (2012), 
although the expression of phylogenetic diversity measures for non- 
ultrametric trees is complex. 

In this paper we use three example human microbiome datasets 
to demonstrate the utility of abundance-weighted phylogenetic di- 
versity measures. We also introduce a one-parameter family inter- 
polating between classical PD and an abundance-weighted gener- 
alization. We call the parameter 9 and denote the one-parameter 
family BWPD e ; BWPD is classical PD, whereas BWPDi is balance- 
weighted phylogenetic diversity, effectively PD aw of Vellend et al. 
(2011). Intermediate values of 6 allow a partially-abundance-weighted 
compromise. Such a compromise has recently been shown to be 
useful for measuring beta diversity, with the introduction of a one- 
parameter family of "generalized UniFrac" measures (Chen et al., 
2012). We use the name Balance Weighted Phylogenetic Diversity as 
described below because there are a variety of abundance weighted 
phylogenetic diversity measures. We compare the behavior of PD 
measures, including BWPD e , under various levels of sampling us- 
ing theory and example data sets. 



2. Materials and methods 

2.1. Datasets. We apply the methods described below to three pre- 
viously described 16S rRNA surveys of the human microbiome. The 
first two datasets are composed of samples from "normal" and dys- 
biotic microbial communities, where previous studies have associ- 
ated changes in diversity with changes in health. The third dataset 
investigates the changes of the skin microbiome through time. 
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2.1.1. Bacterial vaginosis. First, we reanalyze a pyrosequencing dataset 
describing bacterial communities from women being monitored in a 
sexually transmitted disease clinic for bacterial vaginosis (BV). BV 
has previously been shown to be associated with increased commu- 
nity diversity (Fredricks et al., 2005). For this study swabs were 
taken from 242 women from the Public Health, Seattle and King 
County Sexually Transmitted Diseases Clinic between September 2006 
and June 2010 of which 220 samples resulted in enough material to 
analyze (Srinivasan et al, 2012). 

Selection of reference sequences and sequence preprocessing were 
performed using the methods described in (Srinivasan et al., 2012). 
452,358 reads passed quality filtering, with a median of 1,779 reads 
per sample (range: 523-2,366). 

2.1.2. Oral periodontitis. We also utilize sequence data from a study 
of subgingival communities in 29 subjects with periodontitis, along 
with an equal number of healthy controls (Griffen et al., 2011a). The 
publication analyzing this dataset showed increased community di- 
versity in samples from diseased patients compared to healthy con- 
trols. Raw sequences were filtered, retaining only those reads with: 
a mean quality score of at least 25, no ambiguous bases, at least 150 
base pairs in length, and an exact match to the sequencing primer 
and barcode. A total of 759,423 reads passed quality filtering, with a 
median of 8,320 reads per sample (range: 4,096-14,319). 

As the phylogenetic placement method used below to calculate 
our measures requires a reference tree and alignment, we created a 
tree with FastTree 2.1.4 (Price et al., 2010) using the alignment and 
accompanying taxonomic annotation from the curated CORE data- 
base of oral microbiota (Griffen et al., 2011b). 

2.1.3. Skin microbiome through time. Our third data set is a study of 
skin microbial diversity through adolescence Oh et al. (2012). Aligned 
sequences were obtained directly from the authors, although sequence 
data is available under the accession numbers [GQ000001] to [GQ116391] 
and can be accessed through BioProject ID 46333. 

2.2. Balance-weighted phylogenetic diversity. In this section we in- 
troduce BWPDe, our one-parameter family interpolating between 
classical PD and fully balance-weighted phylogenetic diversity. We 
will primarily consider so-called unrooted (Pardi and Goldman, 2007) 
phylogenetic diversity, which does not necessarily include the root. 
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The case of rooted phylogenetic diversity can be calculated in a sim- 
ilar though simpler way as described below. Although we will pri- 
marily be working in an unrooted sense, it will be useful to use ter- 
minology that corresponds to the rooted case. For this reason, if the 
tree is not already rooted, assume an arbitrary root has been chosen; 
let the proximal side of a given edge be the side that contains the root 
and distal be the other. 

We will describe BWPD# in terms of a phylogenetic tree T with 
leaves L, and a contingency table describing the number of observa- 
tions of the organisms at the leaves in various samples. The con- 
tingency table has rows labeled with the leaves of T, and columns 
labeled by samples. In microbial ecology this is frequently known as 
an OTU table. The entry corresponding to a given leaf and a given 
sample is the number of times that leaf was observed in that sample. 

The classical (unrooted) phylogenetic diversity of a given sample 
in this context is the total branch length of the tree subtended by the 
leaves in that sample. 

The path to generalizing PD is to note that this can be expressed 
as a sum of branch lengths multiplied by a step function. Let f(x) 
be the function that is one f or x > and zero otherwise. Let g(x) = 
min(/(a;), /(l — x)) and D s (i) be the fraction of reads in sample s that 
are in leaves on the distal side of edge %. Phylogenetic diversity can 
be then expressed as 

(1) PD u (a) = ^4 </(£>.(<)) 

i 

That is, the sum of edge lengths in T which have reads from s on 
both the distal and proximal side. 

Note that the step function g is the limit of a one-parameter family 
of functions (Fig. 1). Indeed, defining 

(2) g e {x) = min (x 9 , (1 - x) e ) , 

g is the pointwise limit on the closed unit interval of the g% as 9 goes 
to zero. Thus our one-parameter generalization is 

(3) BWPDe(s) = Y^e ig e(D a (i)). 

i 

Note that when 9 = this is PD and when 9 = 1 this is an abundance- 
weighted version of PD equivalent to executing the A nPD recipe of 
Barker (2002) up to a multiplicative factor. 
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FIGURE 1. <?§ curves for various parameters. As 9 
goes to zero, the ge converge pointwise to g, which is 1 
on the interior of the unit interval and on the bound- 
aries. 

The rooted equivalent of (3) is 

(4) RBWD ( S ) = ^(D s (z)) e , 

i 

which interpolates between rooted PD and an abundance-weighted 
version. Vellend et al. (2011) describe a similar measure, PD aw , which 
is equal to RBWPDx multiplied by the total number of branches in 
T. 

We call BWPDx balance-weighted phylogenetic diversity because 
it weights edges according to the balance of read fractions on either 
side of an edge- edges with even amount of mass on either side 
are up-weighted, while edges with an uneven balance of mass are 
down- weighted. Indeed, if \x — (1 — x) | is taken to represent the im- 
balance of read fraction on either side of an edge, then 1 — \x — (1 — x) \ 
can be taken to be a measure of balance; note that on the unit inter- 
val, min(x, 1 — x) = 1 — \x — (1 — x) |. Because a small x or an x close 
to 1 gives a small coefficient in the summation, small collections of 
reads or small perturbations of the read distribution will not change 
the value of BWPDi appreciably. 
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2.3. Calculation of PD measures in example applications. Reads 
from the vaginal and oral studies were placed on a tree created from 
a curated set of taxonomically annotated reference sequences. As 
phylogenetic entropy and 9 D(T) operate on a rooted phylogeny ref- 
erence trees were assigned a root taxonomically (Matsen and Gal- 
lagher, 2012). pplacer was run in posterior probability mode (using 
the -p and — informative-prior flags), which defines an infor- 
mative prior for pendant branch lengths with a mean derived from 
the average distances from the edge in question to the leaves of the 
tree. The resulting set of placements were classified at the family 
rank using a hybrid classifier implemented in the guppy tool from 
the pplacer suite. The hybrid classifier assigns taxonomic annota- 
tions to sequences using the combination of a na'ive Bayes classifier 
(Wang et al., 2007) with a phylogenetic classifier (Matsen et al., un- 
published results). Any reads that could not be confidently classified 
to the family rank were not used in measures based on classification. 

Full-length 16S sequences were available for the skin data, and so 
a more traditional tree-building approach was used. Representative 
OTUs were chosen for each site by clustering at 97% identity using 
USEARCH (Edgar, 2010), with trees built on OTU centroids using 
FastTree (Price et al., 2010). To conform with methods used in that 
paper, the na'ive Bayes classifier (Wang et al., 2007) was used to infer 
genus-level classifications to taxonomically root the tree; in our case 
we used the RDP classifier v2.5. The contingency (OTU) tables gen- 
erated by clustering were made available to our tools via the BIOM 
(McDonald et al., 2012) format. 

PD U (unrooted PD), phylogenetic quadratic entropy (Rao, 1982), 
phylogenetic entropy (Allen et al, 2009), and 9 D(T) (Chao et al, 
2010) were implemented for phylogenetic placements in the freely- 
available pplacer suite of tools (Matsen et al., 2010) (ht tp : / /mat sen . 
f here . org/pplacer) in the subcommand guppy fpd. PD U on rar- 
efied phylogenetic placements was calculated using guppy rarefy. 

Discrete measures of alpha diversity and richness were calculated 
on contingency tables obtained from clustering and taxonomic clas- 
sification. Sequences were clustered into Operational Taxonomic Units 
(OTUs) at a 97% identity threshold using USEARCH 5.1 (Edgar, 2010). 
Similar results were observed when clustering at 95% identity (re- 
sults not shown). OTU counts and family-level taxon counts were 
then rarefied to the read count of the specimen in the dataset with 
the fewest sequences in R 2.15.1 (R Development Core Team, 2012) 
using the vegan package (Oksanen et al., 2012). We obtained values 
for the Simpson (1949) and Shannon (1948) diversity indices, as well 
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as the Chaol (Chao, 1984) and ACE (Chao and Lee, 1992) measures 
of species richness using vegan functions diversity and estimateR. 

2.4. Comparative analysis of alpha diversity measures. To inves- 
tigate the relation between various measures of alpha diversity, we 
calculated Pearson's r between all pairs of measures using the func- 
tion rcorr from the R package Hmisc (Harrell Jr., 2012). We then 
performed hierarchical clustering with the R function hclust, using 
d = 1 — r as the distance between two measures. 

Association of each measure with clinical criteria for the first two 
data sets was evaluated by examining the accuracy of a logistic re- 
gression using the measure as the sole predictor of whether the sam- 
ple came from a "normal" or dysbiotic subject. In the vaginal dataset, 
we assessed each measure's ability to predict whether a sample was 
from a subject positive for BV by Amsel's criteria, a clinical diagnos- 
tic method (Amsel et al., 1983). In the oral dataset, we assessed each 
measure's ability to predict whether a sample was from a healthy 
control, or a subject with periodontitis. Accuracy in predicting sam- 
ple community state was assessed by leave-one-out cross-validation 
using the R package boot (Davison and Hinkley, 1997; Canty and 
Ripley, 2012). 

For the vaginal dataset, we also calculated R 2 values using each 
measure individually as a predictor for sample Nugent score in a 
linear regression. The Nugent score provides a diagnostic score for 
BV which ranges from (BV-negative) to 10 (BV-positive) based on 
presence and absence of bacterial morphotypes as viewed under a 
microscope (Nugent et al., 1991). 

We calculated p-values to compare within- and between-stratification 
variability using R's built-in t.test function for the vaginal data, which 
had a binary stratification, and aov function for the oral and vaginal 
data sets. The vaginal dataset data was stratified by Amsel's crite- 
rion, the oral dataset by condition and sampling site, and the skin 
microbiome dataset by Tanner scale of physical development (Oh 
etal.,2012). 

Plots were prepared with R base graphics and ggplot2 (Wickham, 
2009). 

2.5. Evaluation of performance under rarefaction. Phylogenetic place- 
ments were rarefied using the rarefy subcommand of the guppy tool 

in the pplacer suite. Phylogenetic alpha diversity measures were cal- 
culated on the resulting rarefied placements as described above. 
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3. Results 

3.1. Application to the human microbiome. 

3.1.1. Vaginal microbiome. Like Srinivasan et al. (2012) and many oth- 
ers in the field, we observe greater diversity in BV positive speci- 
mens using a variety of diversity and richness measures (Fig. ??). In 
particular, this is true for BWPD^ for a variety of values of 6 (Fig. ??). 
In the vaginal data, phylogenetic measures of alpha diversity have 



Measure 


Nugent R 2 


Amsel Accuracy 


Amsel p-value 


BWPDq.25 


0.738 


0.828 


1.49E-35 


Simpson (Family) 


0.731 


0.822 


2.07E-33 


Rarefied PD U 


0.731 


0.828 


6.81E-35 


Shannon (Family) 


0.721 


0.821 


8.85E-33 


BWPDq.5 


0.703 


0.823 


2.16E-33 


PD U 


0.696 


0.832 


1.26E-32 


Phylo. entropy 


0.679 


0.832 


1.56E-31 


a25 D(T) 


0.677 


0.818 


8.74E-30 


a5 D(T) 


0.662 


0.814 


5.35E-29 


Phylo. quad, entropy 


0.647 


0.811 


7.70E-30 


BWPDi 


0.610 


0.796 


5.66E-28 


Chaol (Family) 


0.610 


0.823 


9.79E-24 


Chaol (OTU) 


0.450 


0.758 


1.61E-19 


ACE (OTU) 


0.422 


0.763 


6.86E-20 


Shannon (OTU) 


0.380 


0.754 


6.73E-16 


Simpson (OTU) 


0.192 


0.700 


1.36E-07 


ACE (Family) 


0.088 


0.666 


1.51E-01 



TABLE 1 . Correlation and predictive performance of 
the various alpha diversity measures, ordered by de- 
creasing R 2 value. Nugent R 2 : R 2 value using the mea- 
sure as a predictor, and the Nugent score as response 
in a linear model. Amsel accuracy: proportion of spec- 
imens with correct BV classification under a leave-one- 
out cross-validation. Amsel p-value: p-value from a 
two-sample t-test on values stratified by BV classifica- 
tion. "OTU" designates the measure applied to 97% 
clustering groups, and "Family" designates taxonomic 
classification at the family level. Measures described 
in main text. 
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FIGURE 2. Dendrogram relating alpha diversity mea- 
sures applied to the vaginal dataset. 

better cross-validation accuracy for the Amsel classification and bet- 
ter correlation with the Nugent score than discrete OTU-based mea- 
sures (Table 1). All measures were somewhat accurate in identifying 
community state, with even the worst performers classifying at least 
70% of samples correctly BWPD 02 5, rarefied PD U , PD U , and phylo- 
genetic entropy perform equally well predicting BV status. Corre- 
lation with Nugent score varies more widely, from 0.19 using Simp- 
son (OTU) to 0.74 using BWPD0.25 or Simpson applied to family- 
level classifications. OTU-based measures rank in the bottom half 
of the measures tested, and below all phylogenetic measures. Phy- 
logenetic diversity, which can be viewed as a measure of richness, 
outperforms discrete measures of richness, and most measures in- 
corporating abundance. 

In the hierarchical clustering of alpha measures on the vaginal 
data set, phylogenetic methods are separated from OTU-based meth- 
ods (Fig. 2). BWPDg is similar to different extant phylogenetic alpha 
diversity measures for different 9. The Simpson and Shannon diver- 
sity measures cluster together, as do the ACE and Chaol richness 
measures. 

Fig. 3 shows values of BWPD e calculated before (x-axis) and after 
(y-axis) a single rarefaction to 523 sequences per sample. Samples 
for which the BWPD e value changes little lie close to the blue line, 
which shows the case of no difference between original and rarefied 
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FIGURE 3. Comparison of rarefied and unrarefied val- 
ues of various phylogenetic alpha diversity measures 
as applied to the vaginal dataset. The value of six al- 
pha measures for each specimen using all available se- 
quences is plotted on the x-axis. The value of the alpha 
measures for each specimen after a single rarefaction 
to 523 sequences (the smallest sequence count across 
specimens) is plotted on the y-axis. The y = x line is 
shown in blue. 

samples. Increasing 9, which corresponds to increased use of abun- 
dance information, reduces the change in BWPDg induced by rar- 
efaction. Phylogenetic quadratic entropy and phylogenetic entropy 
both show behavior similar to BWPDi, with rarefaction introducing 
little effect. 

It might be possible to formalize a statement to this effect by com- 
puting the expectation of these alpha measures under rarefaction. 
However, computing the expectation for BWPD e under rarefaction 
does not appear to be straightforward: the methods of Dremin (1994) 
might be applicable in this setting, however, even the integer mo- 
ments of the hypergeometric distribution are complicated and the 
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non-integer moments are bound to be very complex. We have, how- 
ever, shown in the Appendix that the expectation of phylogenetic 
quadratic entropy under rarefaction to k sequences assigned to the 
tips of a phylogenetic tree is 



where d; L is the number of sequences falling below edge i and £i is 
the length of edge i. This is almost identical to the unrarefied value 
of phylogenetic quadratic entropy, i.e. 



Thus it is not surprising to see that the expectation of PQE under 
rarefaction is very close to the original value (Fig. ??) for reasonably 
large k and n. 

3.1.2. Oral microbiome. As previously observed by Griffen et al. (2011a), 
we find generally higher diversity in samples from diseased patients 
(Fig. 4). We evaluated the ability of each alpha diversity measure to 
predict whether a sample came from an individual with periodonti- 
tis, regardless of sample collection site, using the above methods. 

In the oral dataset, phylogenetic alpha diversity measures incor- 
porating abundance gave the best predictions of community state 
(Table ??, Fig. 4). In contrast, classical phylogenetic diversity was 
amongst the worst predictors; rarefaction did help, but rarefied PD 
still performed worse than phylogenetic measures taking abundance 
into account. 

OTU-based methods and phylogenetic methods are not as sepa- 
rated in a hierarchical clustering as for the vaginal dataset (Fig. ??). 
However, many of the same pairings are present in both clusterings: 
BWPD0.5 with PE, BWPD! with QE, Simpson with Shannon, ACE 
with Chaol, and PD U with rarefied PD. Interestingly, PD U , rarefied 
PD U , and BWPD0.25 all cluster with the discrete richness measures 
ACE and Chaol. 

Like the vaginal dataset, incorporating abundance information de- 
creases the effect of rarefaction on BWPD e values (Figs. ??, ??). 



E[PQE fc ] 



k- 1 



y^/idijn - di) 



kn(n — 1) 




3.1.3. Skin microbiome. To further assess resolution and robustness 
of abundance weighted phylogenetic diversity measures, we con- 
sidered skin microbiome data from a study by Oh et al. (2012). This 
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FIGURE 4. Comparison of diversity between samples 
from healthy controls, healthy sites of diseased pa- 
tients, and diseased sites of diseased patients in the 
oral dataset, using different measures of alpha diver- 
sity. Top row: cluster-based methods. Bottom row: 
phylogenetic methods. 



study tracked the changes of the skin microbiome through devel- 
opmental stages. Because there are five Tanner stages, and they do 
not have a monotonic relationship with skin microbiome diversity 
(Oh et al., 2012), we focused on ANOVA p-values to see if the diver- 
sity measurements had small within-stage heterogeneity compared 
to between-stage heterogeneity. To compare the ANOVA p-values 
associated with the diversity measurements across the various data 
sets, we ranked the p-value of the diversity measures from lowest 
to highest for each data set individually. We averaged these ranks 
to gain an overall measure of performance. The results again show 
phylogenetic measures generally performing better than OTU-based 
measures (Tab. 5). In this case, a light weighting or no weighting 
of phylogenetic diversity by abundance performed better than full 
abundance-weighting. Note that we are not presenting these un- 
corrected p-values as evidence that there is an interesting relation- 
ship between skin microbiome and developmental stage, but rather 
are using p-values as a way of measuring within-stage heterogeneity 
compared to between-stage heterogeneity for the various measures. 
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Ac 




N 




Pc 




Vf 


mean rank 


BWPDo.25 


2.90e- 


-02 


1.29e- 


-03 


4.94e- 


-03 


1.71e- 


-04 


3.50 


PD U 


2.72e- 


-02 


5.48e- 


-03 


8.62e- 


-03 


5.06e- 


-04 


5.25 


°- 25 D(T) 


2.95e- 


-02 


1.24e- 


-03 


1.53e- 


-02 


3.86e- 


-04 


5.50 


a5 D(T) 


3.03e- 


-02 


4.95e- 


-04 


2.92e- 


-02 


3.20e- 


-04 


5.75 


BWPDq.5 


6.53e- 


-02 


7.34e- 


-05 


1.43e- 


-02 


1.42e- 


-03 


6.75 


°D(T) 


3.08e- 


-02 


2.37e- 


-03 


9.77e- 


-03 


8.88e- 


-04 


7.00 


Chaol (OTU) 


2.97e- 


-02 


2.68e- 


-03 


9.14e- 


-03 


9.97e- 


-03 


7.00 


Shannon (OTU) 


7.09e- 


-02 


8.48e- 


-02 


1.23e- 


-01 


2.70e- 


-05 


8.00 


Phylo. entropy 


1.17e- 


-01 


2.31e- 


-05 


8.37e- 


-02 


1.03e- 


-02 


8.25 


Phylo. quad, entropy 


2.52e- 


-01 


6.31e- 


-06 


4.77e- 


-01 


1.55e- 


-01 


9.00 


Simpson (OTU) 


1.17e- 


-01 


3.68e- 


-01 


8.75e- 


-01 


1.15e- 


-04 


9.50 


BWPDi 


3.11e- 


-01 


2.99e- 


-05 


6.45e- 


-01 


5.33e- 


-01 


10.25 



TABLE 2. ANOVA p-values for various phylogenetic 
diversity statistics applied to the skin microbiome data 
of Oh et al. (2012). Rows are ordered by increasing 
mean rank across sites. The same site abbreviations are 
used as in their paper: Af, antecubital fossa; N, nares; 
Pf, popliteal fossa; Vf, volar forearm. 



3.1.4. Applications summary. In all three of the data sets investigated, 
abundance-weighted phylogenetic diversity measures showed good 
performance to distinguish between community states: between "nor- 
mal" and dysbiotic samples in the oral and vaginal microbiomes, 
and between developmental stages in the skin microbiome. Notably, 
the best distinguishing measure in each dataset was both phyloge- 
netic and abundance-weighted. BWPD#, our new family of abundance- 
weighted phylogenetic diversity measures, was highly correlated with 
clinical status although the value of 6 most associated with commu- 
nity state varied. On the vaginal and oral data sets intermediate val- 
ues of 9 for BWPDfii provide the best correlation with clinical sta- 
tus. These results correspond to analogous results for beta diversity, 
where an intermediate exponent for "generalized UniFrac" was the 
most powerful (Chen et al., 2012). 
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4. Discussion 

Phylogenetic alpha diversity measures were more closely related 
to community state than were discrete measures based on OTU clus- 
tering for the data sets investigated here. This result is especially in- 
teresting given that the Simpson index, the Shannon index, or count- 
ing applied to OTU tables are very common ways of characterizing 
microbial diversity (Fierer et al., 2007; Grice et al., 2009; Hill et al., 
2003; Dethlefsen and Relman, 2011). As also noted by Aagaard et al. 
(2012), we find that measurements of diversity using taxonomic clas- 
sification can be useful in describing communities, and in fact per- 
form much better than the same measurements of diversity applied 
to OTU counts; however, this approach requires a taxonomically 
well characterized environment. Our results can be viewed as an 
experimental confirmation of the notion that incorporating similar- 
ity between species is important to get sensible measures of diver- 
sity, which has been advocated by many, including most recently by 
Leinster and Cobbold (2012). 

We find that classical phylogenetic diversity is sensitive to sam- 
pling depth, underestimating the true value in small samples. Biases 
have also been described for diversity measures using OTU tables 
(Gihring et al., 2012). In contrast, we observe that some abundance- 
weighted phylogenetic measures are relatively robust to varying lev- 
els of sampling. 

As of the publication of this paper, no abundance-weighted phylo- 
genetic alpha diversity measures are implemented in either mothur 
(Schloss et al., 2009) or QIIME (Caporaso et al., 2010), two of the most 
popular tools for analysis of microbial ecology data. Although the 
fact that abundance-weighted phylogenetic diversity measures per- 
formed best for the three data sets investigated here does not imply 
that they are best in general, we suggest that abundance-weighted 
phylogenetic measures be given greater consideration for microbial 
ecology studies. For this to happen, implementations in commonly 
used microbial ecology software packages will be needed, in addi- 
tion to our implementation and that of the picante R package (Kem- 
bel et al, 2010). 
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1. Appendix 



1.1. Rarefaction of phylogenetic quadratic entropy. We investigate the rar- 
efaction of phylogenetic quadratic entropy (PQE), which is a diversity coefficient 
in the language of (Rao, 1982) and defined as follows (Warwick and Clarke, 1995; 
Allen et al., 2009). If the tree is not rooted, root it arbitrarily (the rooting does 
not impact the value). PQE is defined on a tree as 

M KJB, 

i 

where li is the length of edge i and ai is the number of leaf observations that are 
distal (away from root) from edge i. 

Assume we rarefy to k observations as above; let A^ denote the random vari- 
able that is the number of observations distal to edge i after rarefaction. The 
phylogenetic quadratic entropy is then 

The random variable Ai has a hypergeometric distribution, performing k draws 
with di possible successes in a population of size n. Let m be the expectation of 
Ai, which is simply kdi/n. The variance of the hypergeometric distribution is well 
known to be 

2 kdi(n - di)(n - k) 
(3) v l = 



n 2 {n — 1) 



Next 



k \ k 



E[PQE fc ] =£>E 

i 



By definition, 



E[A*]=tf+al 

Thus the expectation of the phylogenetic quadratic entropy upon rarefaction is 
(4) E[PQE fe ] = p^^(fc Mi -^ 2 _ (J 2 )> 

i 

Expanding the term in parentheses from (4): 

2 2_ u 2 d i i,2 d i kd t (n ~ di)(n - k) 

KHi ~ Mi a i — K K 
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= kd, 
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Putting this back in (4), we obtain 



(5) 



E[PQE fe ] = 



k-l 



'^2e i d i (n - di) 



kn(n — 1) 



In principle one could calculate the variance of phylogenetic quadratic entropy 
in terms of the higher order moments of the hypcrgeometric distribution. However, 
these higher moments are very messy and we have not attempted to write out the 
variance calculation. We also note that this derivation could be easily generalized 
to the setting of a "tree with marks" as in (Nipperess and Matsen, 2012). 

1.2. Description of PD in more general setting. We can describe the methods 
in the general setting where samples are represented by a mass distribution on a 
tree. As described elsewhere (Evans and Matsen, 2012), this generalizes the notion 
of representing a sample by an OTU count equipped with a phylogenetic tree on 
OTU representative. Specifically, if the total sample size is N, then n observations 
of a given OTU uj are represented by a point mass of weight n/N at ui. 

As observed by others (Allen et al., 2009) phylogenetic diversity measures can 
be written as 



where F is some real-valued function on the unit interval. This can be further 
generalized to the case of an abitrary probability distribution by writing this as an 
integral where A is the length measure on the tree (Evans and Matsen, 2012) and 
now D s (y) is the total mass on the distal side of y. 



For phylogenetic quadratic diversity, F{x) = x(l —x), and for phylogenetic entropy, 
F(x) — — x\ogx. As described above, the BWPDg fits into this framework with 
F{x) = min( 5e (x),5e(l - a;)). 
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Figure SI. Comparison of diversity between samples from BV 
negative and BV positive women, using different measures of al- 
pha diversity. Top row: cluster-based methods. Bottom rows: 
phylogenetic methods. 
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Figure S2. Values of BWPDg for various 9. Each line represents 
a specimen from the vaginal dataset. Lines are colored by Nugent 
score. A Nugent score of 7-10 is consistent with bacterial vaginosis. 
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ANOVA p-value 


Phylogenetic entropy 
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Phylogenetic quadratic entropy 
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1.09E-07 
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0.734 


6.04E-05 
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3.56E-04 


Rarehed PD U 
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2.47E-01 


PD U 


0.664 


2.60E-05 
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1.91E-01 


ACE (OTU) 


0.636 


4.21E-03 



Table S 1 . Predictive accuracy of each measure in the oral dataset 
and p-value from an ANOVA stratified by disease status and sam- 
pling site 
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Figure S3. Comparison of phylogenetic quadratic entropy (PQE) 
calculated on all sequences from the vaginal dataset to the expec- 
tation of PQE expectation under rarefaction to 523 sequences per 
specimen (the smallest sequence count across all specimens) com- 
puted via our analytical formula. The y = x line is shown in blue. 
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Figure S4. Comparison of rarefied and unrarefied values of var- 
ious phylogenetic alpha diversity measures as applied to the oral 
dataset. The value of six alpha measures for each specimen us- 
ing all available sequences is plotted on the x-axis. The value of 
the alpha measure for each specimen after a single rarefaction to 
4,096 sequences (the smallest sequence count across specimens) is 
plotted on the y-aods. The y — x line is shown in blue. 
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Figure S5. Comparison of phylogenetic quadratic entropy (PQE) 
calculated on all sequences from the oral dataset to the 
analytically-derived expectation of PQE under rarefaction to 4096 
sequences per specimen (the smallest sequence count across all 
specimens) computed via our analytical formula. The y = x line 
is shown in blue. 
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Figure S6. Dendrogram relating alpha diversity measures 
plied to the oral dataset. 



